

Hyperplane Arrangements of Trained ConvNets Are Biased

Gamba, Matteo, Carlsson, Stefan, Azizpour, Hossein, Björkman, Mårten

arXiv.org Artificial Intelligence

In recent years, understanding and interpreting the inner workings of deep networks has drawn considerable attention from the community [7, 15, 16, 13]. One long-standing question is the problem of identifying the inductive bias of state-of-the-art networks and the form of implicit regularization that is performed by the optimizer [22, 31, 2] and possibly by natural data itself [3]. While earlier studies focused on the theoretical expressivity of deep networks and the advantage of deeper representations [20, 25, 26], a recent trend in the literature is the study of the effective capacity of trained networks [31, 32, 9, 10]. In fact, while state-of-the-art deep networks are largely overparametrized, it is hypothesized that the full theoretical capacity of a model might not be realized in practice, due to some form of self-regulation at play during learning. Some recent works have, thus, tried to find statistical bias consistently present in trained state-of-the-art models that is interpretable and correlates well with generalization [14, 24]. In this work, we take a geometrical perspective and look for statistical bias in the weights of trained convolutional networks, in terms of hyperplane arrangements induced by convolutional layers with ReLU activations.


Word2Vec

#artificialintelligence

Word2Vec is a two-layer neural network, based on the Continuous Bag-of-Words (CBOW) and Skip-gram architectures, that captures semantic information. It generates word embeddings (mappings of words into a vector space) for a given text corpus. It converts words into vectors, and these vectors support operations such as addition, subtraction, and distance computation that preserve the relationships among words. How are these relationships formed? Word2Vec assigns similar vector representations to similar words.
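As a concrete illustration of these vector operations, the sketch below uses hand-crafted 2-dimensional toy vectors (not real trained embeddings, which typically have 100 or more dimensions learned from a corpus) to show how cosine similarity and vector arithmetic can surface relationships such as the classic king - man + woman ≈ queen analogy:

```python
import math

# Toy 2-d "embeddings" chosen by hand for illustration only;
# a trained Word2Vec model would learn such vectors from text.
vectors = {
    "king":  [0.9, 0.8],
    "man":   [0.5, 0.2],
    "woman": [0.5, 0.9],
    "queen": [0.9, 1.5],
}

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Vector arithmetic: king - man + woman
analogy = [k - m + w for k, m, w in
           zip(vectors["king"], vectors["man"], vectors["woman"])]

# The word whose vector is most similar (by cosine) to the result:
nearest = max(vectors, key=lambda word: cosine(analogy, vectors[word]))
print(nearest)  # queen
```

With real embeddings the same arithmetic is done in the learned vector space, and the nearest neighbour is found over the whole vocabulary.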


Understanding Softmax Confidence and Uncertainty

Pearce, Tim, Brintrup, Alexandra, Zhu, Jun

arXiv.org Machine Learning

It is often remarked that neural networks fail to increase their uncertainty when predicting on data far from the training distribution. Yet naively using softmax confidence as a proxy for uncertainty achieves modest success in tasks exclusively testing for this, e.g., out-of-distribution (OOD) detection. This paper investigates this contradiction, identifying two implicit biases that do encourage softmax confidence to correlate with epistemic uncertainty: 1) Approximately optimal decision boundary structure, and 2) Filtering effects of deep networks. It describes why low-dimensional intuitions about softmax confidence are misleading. Diagnostic experiments quantify reasons softmax confidence can fail, finding that extrapolations are less to blame than overlap between training and OOD data in final-layer representations. Pre-trained/fine-tuned networks reduce this overlap.


Perceptron -- Deep Learning Basics – Hacker Noon

#artificialintelligence

The perceptron is a fundamental unit of the neural network: it takes weighted inputs, processes them, and is capable of performing binary classification. In this post, we will discuss the working of the perceptron model. This is a follow-up to my previous post on the McCulloch-Pitts neuron. In 1958, Frank Rosenblatt proposed the perceptron, a more generalized computational model than the McCulloch-Pitts neuron. The key feature of Rosenblatt's perceptron was the introduction of weights for the inputs.
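A minimal sketch of the perceptron described above (trained here on the AND function, a toy task chosen for illustration, not taken from the original post): inputs are weighted and summed, the sum is thresholded to produce a binary output, and the weights are nudged whenever a prediction is wrong:

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum exceeds 0, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Rosenblatt's learning rule: w += lr * (label - prediction) * x."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            delta = y - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:  # converged: every sample classified correctly
            break
    return weights, bias

# Binary AND is linearly separable, so the perceptron converges on it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

The convergence guarantee holds only for linearly separable data; on XOR, for example, the loop simply exhausts its epochs.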


Information retrieval document search using vector space model in R

@machinelearnbot

Note that there are many variations in how the term frequency (tf) and inverse document frequency (idf) are calculated; in this post we have seen one of them. The images below show other recommended variations of tf and idf, taken from Wikipedia. Mathematically, the closeness of two vectors is measured by the cosine of the angle between them. So, to find the documents relevant to a query, we compute the similarity score between each document vector and the query vector using cosine similarity, and rank documents by that score.
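The original post works in R; the sketch below re-implements the same idea in Python on a made-up three-document corpus. Each document and the query are mapped to tf-idf vectors over a shared vocabulary, and documents are ranked by the cosine of the angle to the query vector:

```python
import math

docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "stocks fell sharply today"]
query = "cat mat"

tokenized = [d.split() for d in docs]
vocab = sorted({t for tokens in tokenized for t in tokens} | set(query.split()))

def tf(term, tokens):
    return tokens.count(term) / len(tokens)          # length-normalized count

def idf(term):
    df = sum(term in tokens for tokens in tokenized)  # document frequency
    return math.log(len(tokenized) / df) if df else 0.0

def tfidf_vector(tokens):
    return [tf(t, tokens) * idf(t) for t in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

q = tfidf_vector(query.split())
scores = [cosine(tfidf_vector(tokens), q) for tokens in tokenized]
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # the first document shares both query terms
```

The third document shares no terms with the query, so its score is exactly zero; the first document contains both "cat" and the rarer "mat" and therefore ranks highest.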


Estimating the coefficients of a mixture of two linear regressions by expectation maximization

Klusowski, Jason M., Yang, Dana, Brinda, W. D.

arXiv.org Machine Learning

The Expectation-Maximization (EM) algorithm is a widely used technique for parameter estimation. It is an iterative procedure that monotonically increases the likelihood. When the likelihood is not concave, it is well known that EM can converge to a non-global optimum. However, recent work has sidestepped the question of whether EM reaches the likelihood maximizer, instead working out statistical guarantees on its loss directly. These explorations have identified regions of initialization for which the EM estimate approaches the true parameter in probability, assuming the model is well specified. This line of research was spurred by [1], which established general conditions under which a ball centered at the true parameter is a basin of attraction for the population version of the EM operator. For a large enough sample size, the difference (in that ball) between the sample EM operator and the population EM operator can be bounded, so that the EM estimate approaches the true parameter with high probability. That bound is the sum of two terms with distinct interpretations.
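To make the setting concrete, here is a minimal sketch of EM for a mixture of two linear regressions. This is an illustration under simplifying assumptions (known noise level, equal mixing weights, one-dimensional covariates, only the two slopes estimated), not the authors' procedure. The E-step computes the posterior probability that each point came from the first component; the M-step solves a weighted least-squares problem for each slope:

```python
import math
import random

rng = random.Random(0)

# Simulated data: y = b*x + noise, with slope b = +2 or b = -2
# picked uniformly per point (noise std 0.3, x bounded away from 0
# so the two lines are well separated on every sample).
true_slopes = (2.0, -2.0)
data = []
for _ in range(200):
    x = rng.uniform(0.5, 1.5)
    b = true_slopes[0] if rng.random() < 0.5 else true_slopes[1]
    data.append((x, b * x + rng.gauss(0.0, 0.3)))

def em_two_regressions(data, b1, b2, sigma=0.3, iters=50):
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point,
        # proportional to the Gaussian likelihood of its residual.
        r = []
        for x, y in data:
            l1 = math.exp(-((y - b1 * x) ** 2) / (2 * sigma ** 2))
            l2 = math.exp(-((y - b2 * x) ** 2) / (2 * sigma ** 2))
            r.append(l1 / (l1 + l2))
        # M-step: weighted least squares for each slope.
        b1 = (sum(ri * x * y for ri, (x, y) in zip(r, data)) /
              sum(ri * x * x for ri, (x, y) in zip(r, data)))
        b2 = (sum((1 - ri) * x * y for ri, (x, y) in zip(r, data)) /
              sum((1 - ri) * x * x for ri, (x, y) in zip(r, data)))
    return b1, b2

# Initialization inside the basin of attraction of the true parameter.
b1, b2 = em_two_regressions(data, b1=1.0, b2=-1.0)
```

Starting instead from an initialization aligned with the wrong component (e.g. b1 negative, b2 positive) would converge to the label-swapped solution, which is the kind of non-global behavior the basin-of-attraction analysis addresses.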